(54) Title: BIT RATE CONTROL FOR VIDEO COMPRESSION

(57) Abstract: A bit rate control scheme for video encoding is described, in which a target bit rate for a picture frame (or video
object or macroblock) is determined based on a fluid-flow model of the buffer dynamics, and in which the buffer target occupancy
is set to about 50% of a buffer safety margin used to determine whether a frame should be skipped or not, the margin being about
80% of the buffer size. A new Rate-Distortion model for determining a suitable quantization parameter to give the target bit rate is
also described, as is a sliding window method of determining prior data points to update the Rate-Distortion model parameters, and a
switching frame-skipping control which switches between a predictive skipping control and a post frame skipping control.



WO 02/096120 



PCT/SG01/00105 


Bit Rate Control for Video Compression 



Technical Field 

The present invention relates to a bit rate control for the compression of video
data. It has particular, but not exclusive, application to the provision of video
over a packet switched network such as the Internet.

Background Art

Bit rate control plays an important role in the provision of multimedia over
communications networks, and has been widely studied by many researchers
for various standards and applications, such as storage media and real-time
transmission with MPEG-1 and MPEG-2, videoconferencing with H.261 and
H.263, and video object coding with MPEG-4.

For different coding standards and applications, different coding parameters are
emphasised and different mechanisms are applied. For example, in MPEG-2,
the most influential coding parameter with regard to picture quality is the
quantization parameter (QP) used for texture coding. This parameter can be
selected for an entire frame of the video sequence or can change from
macroblock to macroblock. In most implementations, it is selected on the basis
of buffer fullness, so that the buffer occupancy is maintained at a given level.

The H.263 coding scheme allows for variable frameskip, and due to the low
bit-rate conditions which may be imposed upon the encoder, it is up to the rate
control algorithm to make appropriate decisions on both spatial and temporal
coding parameters. If the buffer is in danger of overflow, complete frames may
be disregarded at the encoder to allow bits used for the previous frame to be
transmitted out of the buffer to thereby reduce the buffer level and delay. In
conjunction with this frame-skipping mechanism, the bit rate control algorithm
must determine a suitable quantization parameter (QP) to obtain the desired bit
rate.

Similarly to H.263, MPEG-4 bit rate control also considers spatial and temporal
coding parameters. However, the encoder must also consider the significant
number of bits which are used to code shape information such that arbitrarily
shaped objects can be coded. Also, although each video object may be
encoded at a different frame rate, it is preferable that all of the objects are
encoded at the same frame rate in order to yield better video quality. Further,
additional coding parameters are introduced by MPEG-4 to control the number
of bits used to specify the shape of an object. It is the responsibility of the rate
control scheme to incorporate these new parameter decisions along with other
parameter decisions to ensure that the video objects are effectively coded.

In real-time video communications, the encoded bits are placed into an encoder
buffer before they are transmitted through a network to a decoder. If the actual bit
rate of the encoder is greater than the available channel bandwidth, the
additional bits accumulate in the encoder buffer and increase buffer delay,
which is the time needed to send the buffer bits remaining from the previously
encoded frames. When the number of bits in the buffer is too high, the encoder
usually skips some frames to reduce the buffer delay and avoid buffer overflow.
This frame-skipping, however, produces undesirable motion discontinuity in the
encoded video sequence. Conversely, if the buffer level is too low, there may
be periods of time in which no bits are transmitted through the channel, and
hence some channel bandwidth is wasted.

To overcome these two problems, a joint buffer control is usually used to
maintain a buffer occupancy of about 50% of the buffer size after coding each
frame. In order to do this, heuristic methods are usually employed, in which the
target bit rate is increased if the current buffer level is less than half of the buffer
size, and the target bit rate is decreased if the current buffer level is more than
half of the buffer size. Such schemes are disclosed in "Scalable Rate
Control for MPEG-4 Video", H.J. Lee, T.H. Chiang and Y.Q. Zhang, IEEE Trans.
Circuit Syst. Video Technol., 10:878-894, 2000, and in "MPEG-4 rate control for
multiple video objects", A. Vetro, H. Sun and Y. Wang, IEEE Trans. Circuit Syst.
Video Technol., 9:186-199, 1999. These schemes either encode video at a
predefined fixed rate or at a predefined small set of fixed rates.






The existing schemes have problems when used in, for example, Internet
applications and the streaming of video over the Internet. Due to the
connectionless nature of the current Internet protocols and the routing
mechanisms involved, the instantaneous bandwidth available to a particular
user can vary widely in time and cannot in practice be known in advance. The
existing bit rate control schemes cannot adapt themselves quickly enough to the
variations of channel bandwidth, and are not effective enough to achieve the
objectives of video over the Internet.

The aim of the present invention is to provide a bit rate control which provides for
better video quality, especially but not exclusively in Internet applications.

Summary of the Invention

Viewed from a first aspect, the present invention provides a bit rate control
system for the encoding of video data in which the encoded bits are placed in a
buffer prior to transmission, and in which a target encoding bit rate is
determined based on the fullness of the buffer, characterized in that the buffer is
modelled by a fluid-flow traffic model preferably of the form:

$$B_c(n+1) = \max\{0,\ B_c(n) + T(n) - u(n)\}$$

where $B_c(n)$ denotes the buffer level at time n;
$T(n)$ is the actual encoding bit rate; and
$u(n)$ is the channel output rate.
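The buffer recursion above can be illustrated with a few lines of Python. This is a minimal sketch rather than the encoder's actual code: the function name `next_buffer_level` and the sample bit counts are hypothetical, and bits are counted per frame interval.

```python
def next_buffer_level(level, encoded_bits, drained_bits):
    """One step of the fluid-flow model.

    Adds the bits produced for the frame, removes the bits the channel
    drained during the same interval, and clamps the result at empty.
    """
    return max(0, level + encoded_bits - drained_bits)


# Hypothetical trace: the channel drains 10,000 bits per frame interval.
levels = [40_000]
for frame_bits in [12_000, 3_000, 20_000]:
    levels.append(next_buffer_level(levels[-1], frame_bits, 10_000))
print(levels)
```

The clamp matters only when the channel could drain more than the buffer holds, which is the situation in which channel bandwidth is wasted.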



The system of the present invention is able to keep the buffer occupancy closer
to its target, which is preferably set at a predefined percentage (preferably
about 50%) of a safety margin used to determine whether a frame of the video
sequence to be encoded should be skipped, and to adapt itself faster to the
variations of the channel bandwidth, and so will skip fewer frames at a low
bandwidth. This therefore provides a higher overall video quality, and is
attractive for video over the Internet.



Preferably, the target encoding bit rate is given by the equation:

$$\hat{f}(n) = A + (1-\gamma)\,\frac{\zeta B_s}{2} + (\gamma-1)\,B_c(n)$$

where $A$ is the channel output rate;
$\zeta$ is a buffer safety margin;
$B_s$ is the buffer size;
$B_c(n)$ is the current buffer level; and
$0 < \gamma < 1$ is an adjustable parameter.

"A" may be equal to the number of bits available for encoding all of the
inter-frames of a current group of frames being encoded divided by the number of
inter-frames to be encoded in the current group of frames. Alternatively, when
for example providing video over the Internet, "A" may be the actual bandwidth
estimated by using the packet loss information. This allows the variation of the
channel bandwidth to be directly incorporated into the buffer control, and allows
the system to adapt itself in time.
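A short simulation shows how such a target behaves when the channel rate changes. It is a sketch under stated assumptions: the target is read as the channel rate plus (1 - gamma) times the gap between the target occupancy (half of the safety margin times the buffer size) and the current level, the encoder is assumed to hit each target exactly, and the names `target_bits` and `simulate` and all numbers are hypothetical.

```python
def target_bits(channel_rate, buffer_size, level, margin=0.8, gamma=0.75):
    """Frame target: channel rate plus a correction toward the target occupancy."""
    target_level = 0.5 * margin * buffer_size
    return channel_rate + (1.0 - gamma) * (target_level - level)


def simulate(start_level, channel_rates, buffer_size):
    """Track the buffer level, assuming each frame uses exactly its target."""
    levels = [start_level]
    for rate in channel_rates:
        bits = max(0.0, target_bits(rate, buffer_size, levels[-1]))
        levels.append(max(0.0, levels[-1] + bits - rate))
    return levels


# The channel rate drops from 10,000 to 6,000 bits per frame half way through.
levels = simulate(70_000.0, [10_000.0, 10_000.0, 6_000.0, 6_000.0], 100_000.0)
for n, level in enumerate(levels):
    print(n, round(level))
```

Because the channel rate enters the target directly, a bandwidth change shifts the target in the same frame, while the buffer level keeps moving toward 40% of the buffer size.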

Meanwhile, the target bit rate is preferably modified based on the remaining bits
available for encoding and on the remaining frames to be encoded. It may thus
be:

$$f(n) = \max\left\{\beta\,\frac{T_r}{N_r} + (1-\beta)\,\hat{f}(n),\ [\text{fraction illegible in source}] + H_{hdr}(n-1)\right\}$$

where $0 < \beta < 1$ is an adjustable parameter;
$T_r$ is the number of remaining bits available for encoding;
$N_r$ is the number of frames remaining to be encoded; and
$H_{hdr}(n-1)$ is the amount of overhead bits used for the previous
frame.

After the target bit rate is determined, a rate-distortion model preferably of the
following form is further applied to determine the corresponding quantization
parameter:

[equation illegible in source: a quadratic model relating $R$ to $Q$ through the coefficients $c_1$ and $c_2$, the complexity index $\sigma$ and the overhead $H_{hdr}$]

where $R$ is the total number of bits used to encode a frame;
$Q$ is the quantization parameter;
$c_1$ and $c_2$ are first and second order coefficients;
$\sigma$ is an index of video coding complexity; and
$H_{hdr}$ is the amount of overhead bits used.

Further preferably, the coefficients of the Rate-Distortion model are updated
based upon data from a plurality of previous frames. The number of previous
frames used is preferably determined by a sliding window mechanism, wherein
the value of the current window size W(n) is given by:

$$W(n) = \min\{W(n-1) + 1,\ \xi(n) \cdot W_{max}\}$$

where $W_{max}$ is a preset constant;
$\xi(n) = \min\left\{\frac{\sigma(n-1)}{\sigma(n)},\ \frac{\sigma(n)}{\sigma(n-1)}\right\}$; and
$\sigma(n)$ is the mean absolute difference of the frame at time n.

Such a sliding window mechanism smooths the impact of scene changes, and
changes the window size gradually.
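The window rule can be sketched in Python as follows. The sketch assumes the window grows by at most one frame per step and is capped by the preset maximum scaled by the smaller of the two complexity ratios between the current and previous frames. Rounding down to an integer, the floor of one frame and the name `window_size` are additions for illustration, and both complexity values are assumed to be positive.

```python
def window_size(prev_window, mad_now, mad_prev, max_window=20):
    """Sliding-window size for the R-D model update.

    ratio is 1.0 when the complexity is unchanged and approaches 0.0
    after a sharp change, which shrinks the window to recent frames.
    """
    ratio = min(mad_prev / mad_now, mad_now / mad_prev)
    grown = prev_window + 1              # grow by one frame at most
    capped = int(ratio * max_window)     # assumption: round down
    return max(1, min(grown, capped))    # assumption: keep at least one frame


print(window_size(5, 10.0, 10.0))    # steady scene: window grows by one
print(window_size(15, 40.0, 10.0))   # complexity jumps fourfold: window shrinks
```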

After the current frame is encoded, the total number of actual bits used to
encode the current frame is added to the current buffer level. If the buffer is in
danger of overflow, a switched frame skipping mechanism is preferably used to
compute the number of skipped frames.

In one frame skip control, after the current frame is encoded, the next frame to
be encoded will be skipped, if:

$$B_c(n+1) + T(n) - A \ge B_s\,\zeta + T_B(n)$$

where $B_c(n+1)$ is the current buffer level;
$T(n)$ is the actual number of bits used to encode the current frame;
$A$ is the channel output rate;
$B_s$ is the buffer size;
$\zeta$ is a pre-determined buffer safety margin;
$T_B(n)$ is [definition illegible in source; the recoverable fragments are averages of the form $\frac{1}{W(n)}\sum_{j=1}^{W(n)}$]; and
$T(S(j,n))$ $(1 \le j \le W(n))$ denotes the total number of actual bits
generated in the encoding of the previous W(n) frames.

In an alternative frame skip control, a frame skipping parameter $N_{post}$ is set to
skip the next $N_{post}$ frames so that the following buffer condition is satisfied:

$$B_c(n+1) < \zeta B_s$$

where

$$B_c(n+1) = \max\{0,\ B_c(n) + T(n) - A\,(N_{post} + 1)\}$$

$B_c(n)$ is the buffer level at time n;
$T(n)$ is the actual number of bits used to encode the current frame;
$A$ is the channel output rate;
$B_s$ is the buffer size; and
$\zeta$ is a pre-determined buffer safety margin.

The first-mentioned skipping control is preferably provided as a predictive
frame skipping control, the second-mentioned skipping control is preferably provided
as a post-frame skipping control, and the skipping controls are preferably
switched between one another based on the following switching law:

a) The predictive frame skipping control is switched to the post-skipping
control if a frame is skipped; and

b) The post-skipping control is switched to the predictive frame
skipping control if the current frame is not skipped.

The present invention also extends to a method for the encoding of a video
sequence in accordance with the above system features, and to computer
software for implementing the above system and method features.

It further extends to the use of the above features independently of one another,
with for example the Rate-Distortion model defined above being in itself a new
and advantageous model for use in bit rate control.

Brief Description of the Drawings 

The present invention will hereinafter be described in greater detail by reference
to the attached drawings which show an example form of the invention. It is to
be understood that the particularity of the drawings does not supersede the
generality of the preceding description of the invention.

Figure 1 is a diagram of the structure of a typical network over which video
streaming may be provided; and

Figure 2 is a functional block diagram of a video encoder scheme according to
an embodiment of the present invention.

Fig. 1 shows a typical Internet structure over which a video sequence may need
to be transmitted from a source 1 to one or more receivers 2. Due to the
amount of data in a video sequence, the data must be compressed, otherwise
the required transmission bit-rate would be unachievably high.

Thus, an encoder 3 is provided at the source 1 in order to compress the video
data, and decoders 4 are provided at the receivers 2 in order to decode the data
and reconstruct the video sequence. In between the encoder 3 and decoders 4,
the compressed data is routed through various servers 5 and over what may be
many different types of transmission channel 6.

Various different encoding systems have been provided for the compression of
video data, and, for example, MPEG video compression is often employed. The
current MPEG standards are MPEG-1 and MPEG-2, which are similar in basic
concept, and MPEG-4 which is able to provide a low-bandwidth multimedia
format that can contain a mix of media (including recorded video images and
sounds and their computer-generated counterparts), and uses the concept of
"Video Objects" to transmit independent images of arbitrary shape.

In MPEG compression, a video sequence is broken into a number of Groups of
Pictures (GOP), each of which comprises a number of picture frames. Each
frame is broken into a series of slices, and each slice consists of a set of
macroblocks comprising arrays of luminance pixels and associated
chrominance pixels. The macroblocks are divided into 8x8 blocks for encoding.
Each block undergoes a Discrete Cosine Transform (DCT) to provide an array
of DCT coefficients that are then quantized to force various of the coefficients
(generally higher frequency coefficients) to zero so as to reduce the amount of
data to be transmitted. Quantization is carried out by multiplying the DCT
coefficient array by a quantization matrix, each value in the matrix being scaled
by a quantization parameter. The matrix and quantization parameter can be
altered on a frame-by-frame and/or block-by-block basis to alter the amount of
compression. The quantized coefficients then undergo further encoding to
compress the transmission data still further.



The frames in a GOP comprise an Intra-frame (I frame) that is spatially
compressed (in accordance with the above method), and Inter-frames (P and/or
B frames) that are also temporally compressed in a motion-compensated
prediction manner. Thus, each P frame in a sequence is predicted from the
frame immediately preceding it, and each B frame is predicted from preceding
and succeeding frames.

MPEG-4 also includes a Video Object layer between the frame layer and
macroblock layer for specifying different independent objects within a scene.

In order to optimise video quality over a bit-rate range, e.g. in video-streaming
to a number of receivers having different bandwidth capabilities, MPEG-4 also
provides a Fine Granularity Scalability (FGS) scheme in which the coding of the
video data is provided by a base layer and an enhancement layer, the base
layer being designed to meet the lower bound of the bit rate range and the
enhancement layer meeting the upper bound of the bit-rate range. The base
layer is coded as discussed above, and the enhancement layer takes the
original and reconstructed DCT coefficients of the base layer, and subtracts the
reconstructed coefficients from the originals to provide a residue that is then
encoded and transmitted with the base layer. The receivers of the data decode
the base layer to provide a video signal based on the lowest bit rate range, and
can improve the quality by decoding various amounts of the enhancement layer.

The present invention relates to a bit rate control scheme for the compression of
video data, and may for example be used in encoding the base layer of an FGS
scheme. It may especially be used in the FGS scheme disclosed in the co-pending
International PCT patent application filed in Singapore on 25 May 2001 and
entitled "A Fine Granularity Scalability Scheme".

The present bit-rate control scheme consists of three layers, namely the GOP
layer, the frame layer and the video object layer. The whole scheme is shown
in Fig. 2.

The GOP layer rate control 1 is used to allocate bits to each GOP of the video
sequence, each GOP being composed of one I frame and a number of P and B
frames.

The total number of bits available for the video sequence will be:

$$TB = \text{duration} \times R$$

where duration is the duration of the video sequence; and
$R$ is the bit rate for the sequence.

Assuming that the total number of I frames is $N_I$, that the numbers of P and B
frames in the ith GOP are $N_{Pi}$ and $N_{Bi}$, and that the frames have weightings of
$W_I$, $W_P$ and $W_B$, then the number of bits allocated to the ith GOP is:

$$TB_i = TB \cdot \frac{N_{Pi} W_P + N_{Bi} W_B + W_I}{\sum_{j=1}^{N_I} \left(N_{Pj} W_P + N_{Bj} W_B + W_I\right)}$$

For the sake of the present embodiment and for simplicity, it is assumed that
each GOP has the same structure, and so the GOP Layer Rate Control will
allocate each GOP the following number of bits:

$$TB_i = \frac{TB}{N_I}$$
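The GOP-level split can be sketched as a weighted share of the sequence budget. The sketch assumes each GOP's share is its frame-weighted size divided by the sum over all GOPs; the function name `allocate_gop_bits` and the weight values are hypothetical, since no weights are given in the text.

```python
def allocate_gop_bits(total_bits, gop_structures, w_i, w_p, w_b):
    """Split the sequence budget across GOPs in proportion to weighted frame counts.

    gop_structures holds one (num_p_frames, num_b_frames) pair per GOP;
    every GOP is assumed to contain exactly one I frame.
    """
    weights = [n_p * w_p + n_b * w_b + w_i for n_p, n_b in gop_structures]
    total_weight = sum(weights)
    return [total_bits * w / total_weight for w in weights]


# 10 s of video at 384 kbit/s, four identical GOPs of one I and nine P frames.
total = 10 * 384_000
shares = allocate_gop_bits(total, [(9, 0)] * 4, w_i=4.0, w_p=1.0, w_b=0.5)
print(shares)
```

With identical GOP structures the weights cancel and every GOP receives the total divided by the number of I frames, which is the simplified case used in the embodiment.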



After the GOP layer rate control at block 1, the encoder carries out a buffer
initialization at block 2, conducts the Intra-coding of the I-frame at block 3,
updates a Rate-Distortion model at block 4 and checks whether the next
frame must be skipped at a skip-frame block 5 (e.g. because of possible buffer
overrun).

Inter-coding is then performed in which the encoder 3 performs a joint buffer
control at block 6, a Frame Layer Target Bit Rate calculation at block 7 and a
Quantization Parameter calculation at block 8, before carrying out the
Inter-coding of the P or B frame at block 9. After encoding of the frame, the R-D
model update and Frame-skip control are again carried out at blocks 4 and 5
before conducting the encoding of the next inter-frame through block 6, etc.

Where the encoder scheme is used in the Video Object layer, the encoder also
conducts a Target Bit Rate Allocation at block 10, and calculates a shape
threshold in block 8 along with the quantization parameter calculation.

The part of the bit rate control in the frame layer consists of three stages: the
initialization, pre-encoding and post-encoding stages.

(a) Initialization Stage

In the initialization stage of block 2, the encoder carries out three main tasks
with respect to the frame layer control, these being:

(i) initialization of the buffer size based on latency requirements;

(ii) subtraction of the bit count of the I-frame from the bit count of the
ith GOP; and

(iii) initialization of the buffer fullness. If the first GOP is being encoded,
then the buffer fullness is set at 50% of a buffer safety margin (which will be
40% of the buffer size assuming a safety margin of 80%). Otherwise, the
buffer fullness is set at the end level of the previous GOP.

The I-frame is quantized using an initial quantization value of $Q_0$. The
remaining available bits $\hat{R}_0(i)$ for encoding all of the subsequent inter-frames
can be calculated as:

$$\hat{R}_0(i) = TB_i - k_i + \left(0.5 \cdot B_s \cdot \zeta - B_c(i)\right)$$

where $TB_i$ is the number of bits available to encode the ith group of frames;
$k_i$ is the number of bits used to encode the ith intra-frame;
$B_s$ is the buffer size;
$\zeta$ is the buffer safety margin for skipping frames, having a typical value
of 0.8; and
$B_c(i)$ is the buffer level at the start of encoding of the ith group of frames,
with $B_c(1) = 0.5 \cdot B_s \cdot \zeta$.

The channel output rate (the average number of bits to be drained from the
buffer per frame encoding) is then $\hat{R}_0(i) / N_{Pi}$.
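The initialization arithmetic can be sketched as follows, reading the remaining-bits expression as the GOP budget minus the intra-frame bits plus the gap between the target buffer level and the level at the start of the GOP. The function name `init_gop` and the example figures are hypothetical.

```python
def init_gop(gop_bits, intra_bits, buffer_size, start_level, n_inter, margin=0.8):
    """Return (remaining bits for inter-frames, per-frame channel output rate)."""
    target_level = 0.5 * margin * buffer_size
    remaining = gop_bits - intra_bits + (target_level - start_level)
    return remaining, remaining / n_inter


# First GOP: the buffer starts at its target level (40% of a 200,000-bit buffer),
# so the buffer correction term is zero.
remaining, per_frame = init_gop(
    gop_bits=960_000, intra_bits=120_000,
    buffer_size=200_000, start_level=80_000, n_inter=14,
)
print(remaining, per_frame)
```

For later GOPs the starting level is whatever the previous GOP left behind, so a buffer that ended above its target reduces the bits available to the new GOP.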

(b) Pre-encoding Stage

The pre-encoding stage includes setting a target bit rate for the encoding of the
next video frame in the GOP, and setting the quantization parameter for
quantization of the DCT coefficients in accordance with the target bit rate.

When the number of bits in the buffer is too large (e.g. is predicted to exceed a
safety margin), the encoder usually skips some frames to reduce the buffer
delay and avoid buffer overflow. This however produces undesirable motion
discontinuities in the encoded video sequence. Conversely, if the buffer level is
too low, there may be periods of time in which no bits are transmitted through
the channel, and channel bandwidth is wasted.

In order to overcome these problems, a frame level control is adopted which
sets the target bit rate so as to attempt to maintain a buffer occupancy after the
coding of each frame of about 50% of the buffer safety margin (i.e. about 40%
of the buffer size for a 0.8 safety margin).

It should be noted that this differs from the prior art, which sets the target buffer
fullness at the middle level of the buffer. The present scheme enables a low
encoder buffer delay to be maintained and the total delay to be reduced.

In order to determine the target bit rate, the dynamics of the buffer are
represented by a fluid-flow traffic model with $B_c(n)$ denoting the buffer level at
time n:

$$B_c(n+1) = \max\{0,\ B_c(n) + T(n) - u(n)\} \qquad (1)$$

where $T(n)$ is the actual encoding bit rate; and
$u(n)$ is the channel output rate.

Using equation (1) and linear system control theory (see for example Chi-Tsong
Chen, "Linear system theory and design", Holt, Rinehart and Winston, New York,
1984), the target bit rate is scaled based on the buffer size $B_s$, the current buffer
level $B_c(n)$ and the channel output rate $\hat{R}_0(i)/N_{Pi}$, and is given by:

$$\hat{f}(n) = \max\left\{[\text{illegible in source}],\ \frac{\hat{R}_0(i)}{N_{Pi}} + (1-\gamma)\,\frac{\zeta B_s}{2} + (\gamma-1)\,B_c(n)\right\}$$

where $0 \le \gamma < 1$ is an adjustable parameter having a typical value of 0.75.
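The effect of the adjustable parameter can be seen by substituting the target into the buffer model. The following derivation is a sketch under stated assumptions: the encoder is taken to produce exactly the target number of bits, the channel drains $A$ bits per frame, neither clamp is active, the target has the form $A + (1-\gamma)\,\zeta B_s/2 + (\gamma-1)\,B_c(n)$, and $B^{*}$ is shorthand introduced here for the target occupancy.

```latex
\begin{aligned}
B^{*} &= \tfrac{1}{2}\,\zeta B_s \\
B_c(n+1) &= B_c(n) + \hat{f}(n) - A \\
         &= B_c(n) + (1-\gamma)\,B^{*} + (\gamma-1)\,B_c(n) \\
B_c(n+1) - B^{*} &= \gamma\,\bigl(B_c(n) - B^{*}\bigr)
\end{aligned}
```

Under these assumptions the gap between the buffer level and its target shrinks by the factor $\gamma$ at every frame, so with $\gamma = 0.75$ a quarter of the remaining gap is closed per frame, and a smaller $\gamma$ pulls the buffer back faster at the cost of larger swings in the frame target.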

When calculating the bit rate for the frame, the number of remaining bits $T_r$
allocated to the current GOP and the remaining number of frames $N_r$ of the
current GOP should also be taken into account to ensure that there are
available bits for the remaining frames, and so the final frame bit rate is:

$$f(n) = \max\left\{\beta\,\frac{T_r}{N_r} + (1-\beta)\,\hat{f}(n),\ [\text{fraction illegible in source}] + H_{hdr}(n-1)\right\}$$

where $0 < \beta < 1$ is an adjustable parameter having a typical value of
0.585; and
$H_{hdr}(n-1)$ is the amount of bits used for overhead data, that is, the
bits used for non-texture data, e.g. shape information, motion
vector information and header information.

It should be noted that the above method of using a fluid-flow model departs
from the prior art use of heuristic methods for determining the target bit rate,
and enables the buffer occupancy to be kept much closer to the target, so that
fewer frames are skipped.

The present model-based method may be used in any suitable video
transmission system, and is especially attractive when MPEG-4 video is
transported over the Internet where variations in bandwidth occur. Using the
heuristic approach, adjustment of the joint buffer control has a delay of one
step, and cannot adapt itself in time to the variations in channel bandwidth.
However, with the present model-based method, when the channel bandwidth
is time-varying, the term $\hat{R}_0(i)/N_{Pi}$ may be replaced by the estimated actual
channel bandwidth, e.g. by using the packet loss information. Thus, the
variation of the channel bandwidth can be incorporated into the present joint
buffer control, and the scheme can adapt itself in time.

A further point to note is that the receiver synchronization of a continuous media
stream must deal with delay differences and variations. Since the present
frame-layer control keeps the buffer occupancy much closer to the target (50%
of the safety margin, i.e. 40% of the buffer size), the playout buffer delay can be
reduced, and so the total delay is further reduced.

Once the target bit rate is determined, the corresponding quantization
parameter, Q, can be computed by using a Rate-Distortion model, which takes
the form of the following quadratic model:

[equation missing in source]

where $R$ is the total number of bits used to encode a frame;
$Q$ is the quantization parameter;
$c_1$ and $c_2$ are first and second order coefficients;
$\sigma$ is the mean absolute difference of texture computed using the
motion-compensated residual for the luminance component (an
index of video coding complexity); and
$H_{hdr}$ is the amount of bits used for overhead data, that is,
non-texture data, e.g. video/frame syntax, bits used for shape information,
motion vector information and header information.
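As an illustration of how a quantization parameter can be obtained from a quadratic rate model, the sketch below assumes the widely used form in which the texture bits (total bits minus overhead) equal c1 * mad / Q + c2 * mad / Q**2. This form is an assumption and may differ from the exact model of this document. The function name `quantizer_for_target` and the clamp range of 1 to 31 are likewise illustrative.

```python
import math


def quantizer_for_target(target_bits, header_bits, mad, c1, c2, q_min=1, q_max=31):
    """Solve the assumed quadratic rate model for Q and clamp to [q_min, q_max]."""
    texture_bits = target_bits - header_bits
    if texture_bits <= 0:
        return q_max                      # no texture budget: coarsest quantizer
    if abs(c2) < 1e-12:
        q = c1 * mad / texture_bits       # model degenerates to first order
    else:
        # Quadratic in x = 1/Q: c2*mad*x**2 + c1*mad*x - texture_bits = 0
        disc = (c1 * mad) ** 2 + 4.0 * c2 * mad * texture_bits
        if disc < 0:
            return q_max
        inv_q = (-c1 * mad + math.sqrt(disc)) / (2.0 * c2 * mad)
        if inv_q <= 0:
            return q_max
        q = 1.0 / inv_q
    return min(q_max, max(q_min, round(q)))


# 700 texture bits at mad = 5 with c1 = 1000 and c2 = 4000
print(quantizer_for_target(800, 100, 5.0, 1000.0, 4000.0))
```

Solving for 1/Q rather than Q keeps the expression a plain quadratic, and the positive root is the one that yields a positive quantizer when both coefficients are positive.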

(c) Post-encoding Stage

The post-encoding stage includes the processes of updating the parameters $c_1$
and $c_2$ of the Rate-Distortion model and determining whether any
frame-skipping is necessary to prevent possible buffer overflow.

The statistics of quantization parameter value and bit rate value, taken from a
number of previously encoded frames including the immediately preceding
frame, are used to provide improved parameters $c_1$ and $c_2$ for the R-D model by
using a linear regression technique.

The number of frames to use is based on a sliding window mechanism, which is
designed to smooth the impact that a scene change might have in the updating
of the R-D model.

If the complexity changes significantly, i.e. in high motion scenes, a smaller
window with more recent data points after the change is used. Otherwise, a
window with more data points is used. To ensure that the window size is not
varied too rapidly, the window size is increased gradually.

Thus, the value of the current window size W(n) is given by:

$$W(n) = \min\{W(n-1) + 1,\ \xi(n) \cdot \text{Max\_Sliding\_Window}\}$$

where Max_Sliding_Window is a preset constant, and may be set to e.g.
20; and

$$\xi(n) = \min\left\{\frac{\sigma(n-1)}{\sigma(n)},\ \frac{\sigma(n)}{\sigma(n-1)}\right\}$$

The selected sample data points within the window W(n) are denoted as $S(j,n)$
$(1 \le j \le W(n))$.

For the selected data points, the encoder collects the quantization parameter
statistics Q(j) and the actual bit rate statistics T(j), and, using a linear regression
technique, the parameters can be obtained by:

[closed-form regression expressions illegible in source; the recoverable fragments are sums over the W(n) window samples of $T(S(j,n)) - H_{hdr}(S(j,n))$ combined with ratios of $Q(S(j,n))$ and $\sigma(S(j,n))$]
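A generic least-squares update of the two coefficients can be sketched as below. It is not the closed-form expression of this document: it assumes the common quadratic form in which texture bits equal c1 * mad / Q + c2 * mad / Q**2, and fits both coefficients by solving the 2x2 normal equations. The function name `fit_rd_coefficients`, the tuple layout and the first-order fallback are illustrative choices.

```python
def fit_rd_coefficients(samples):
    """Least-squares fit of (c1, c2) from window samples.

    samples is an iterable of (total_bits, header_bits, q, mad) tuples,
    one per frame in the sliding window.
    """
    saa = sab = sbb = say = sby = 0.0
    for total_bits, header_bits, q, mad in samples:
        a = mad / q            # first-order regressor
        b = mad / (q * q)      # second-order regressor
        y = total_bits - header_bits
        saa += a * a
        sab += a * b
        sbb += b * b
        say += a * y
        sby += b * y
    det = saa * sbb - sab * sab
    if abs(det) < 1e-9 * max(1.0, saa * sbb):
        # All samples share one Q: only a first-order fit is identifiable.
        return say / saa, 0.0
    c1 = (say * sbb - sby * sab) / det
    c2 = (saa * sby - sab * say) / det
    return c1, c2


# Synthetic window generated from c1 = 1000, c2 = 4000 with 100 header bits.
window = [(1000 * m / q + 4000 * m / q**2 + 100, 100, q, m)
          for q, m in [(8, 5.0), (10, 6.0), (12, 4.0), (16, 5.0)]]
print(fit_rd_coefficients(window))
```

The fallback matters in practice because consecutive frames often share the same quantizer, in which case the two regressors are proportional and the second-order coefficient cannot be estimated.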



After updating the R-D model, the total number of actual bits T(n) used to
encode the current frame is added to the current buffer level, and a switched
frame skip control is performed to prevent buffer overflow and overcome
continuous frame skipping. The switched frame skipping control is composed of
two basic controllers (a predictive frame skipping controller and a post frame
skipping controller) and a corresponding switching law to determine the active
controller.

In the predictive frame skip controller, a function $T_B$ is defined by:

[definition illegible in source; the recoverable fragments are averages of the form $\frac{1}{W(n)}\sum_{j=1}^{W(n)}$]

where $T(S(j,n))$ $(1 \le j \le W(n))$ denotes the total number of actual bits
generated in the encoding of the previous W(n) frames.

The next frame to be encoded will be skipped, if the current buffer level plus the
estimated number of bits for the next frame is larger than the sum of $T_B(n)$ and
some pre-determined threshold, called the safety margin, that is if:

$$B_c(n+1) + T(n) - A > B_s\,\zeta + T_B(n)$$

where $B_c(n+1)$ is the current buffer level;
$T(n)$ is the actual number of bits used to encode the current frame;
$A$ is the channel output rate (which may be $\hat{R}_0(i)/N_{Pi}$ or is
replaced by the estimated actual channel bandwidth);
$B_s$ is the buffer size; and
$\zeta$ is the pre-determined safety margin.

If skipping takes place, the current buffer level is reduced by the channel output
rate.

In the post frame skipping controller, a frame skipping parameter $N_{post}$ is
increased from zero until the following buffer condition is satisfied; the next $N_{post}$
frames are then skipped by the encoder:

$$B_c(n+1) < \zeta B_s$$

where

$$B_c(n+1) = \max\{0,\ B_c(n) + T(n) - A\,(N_{post} + 1)\}$$
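The search for the number of frames to skip can be sketched as a simple loop. The function name `post_skip_count` and the example figures are hypothetical; the loop increases the count from zero until the predicted buffer level falls below the safety margin times the buffer size.

```python
def post_skip_count(level, frame_bits, channel_rate, buffer_size, margin=0.8):
    """Smallest number of frames to skip so the buffer drops below the margin."""
    if channel_rate <= 0:
        raise ValueError("channel_rate must be positive or the buffer never drains")
    threshold = margin * buffer_size
    n_post = 0
    # Each skipped frame gives the channel one more frame interval to drain.
    while max(0, level + frame_bits - channel_rate * (n_post + 1)) >= threshold:
        n_post += 1
    return n_post


# A large frame lands in an already full buffer.
print(post_skip_count(155_000, 40_000, 10_000, 200_000))
```

The guard on the channel rate is an addition for safety: with a zero drain rate the condition could never be met and the loop would not terminate.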






The predictive frame skipping control is initially used, and the switching law is:

a) The predictive frame skipping control is switched to the post-skipping
control if a frame is skipped; and

b) The post-skipping control is switched to the predictive frame
skipping control if the current frame is not skipped.
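The switching law amounts to a two-state machine, sketched below. Only the selection of the active controller is shown; the skip decisions themselves are passed in as flags. The names `next_controller` and `trace` are hypothetical.

```python
PREDICTIVE = "predictive"
POST = "post"


def next_controller(active, frame_skipped):
    """Apply the switching law to pick the controller for the next frame."""
    if active == PREDICTIVE and frame_skipped:
        return POST            # rule a): a skip hands over to post-skipping
    if active == POST and not frame_skipped:
        return PREDICTIVE      # rule b): no skip hands back to predictive
    return active


def trace(skip_flags, start=PREDICTIVE):
    """Return the active controller before each frame and after the last one."""
    modes = [start]
    for skipped in skip_flags:
        modes.append(next_controller(modes[-1], skipped))
    return modes


print(trace([False, True, True, False]))
```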

Instead of using the switched frame-skipping control, the predictive or post
frame skipping control may be used by itself.

Besides using the present method for the frame layer rate control, the above
method may also be used for the video object layer rate control.



In the video object rate control, the total target bit rate (as found in the frame
layer control) is allocated to each video object according to its coding
complexity, size and perceptual importance. Thus, for a given target bit rate,
the target bit rate for an object i is given by:

[equation illegible in source; the recoverable fragments show the texture budget $f(n) - H_{hdr}(n-1)$ being shared among the objects using the motion term $\chi \sum_j \left(MOT_{ij,x}(n) + MOT_{ij,y}(n)\right)$ and the size term $(1-\chi)\,p_i$]

where $p_i$ is the size of the video object i;
$H_{hdr}(n-1) = \sum_i H_{hdr,i}(n-1)$;
$MOT_{ij,x}(n)$ and $MOT_{ij,y}(n)$ are the absolute values of the jth motion
vector component within the object i at the time n; and
$\chi$ is an adjustable parameter, $0 < \chi < 1$.

Also, to avoid using excessive bits for motion and shape information instead of
for texture, and to balance the bit usage without imposing additional noticeable
distortion, the shape threshold values can be set dynamically based on the
previous coding information.

In the adaptive threshold shape control, let

[definition of $H_B(i)$ illegible in source]

The threshold for the video object i, $\theta_i$, is initially set to zero. If $f_i(n)$ is less than
$H_{hdr,i}(n-1) - 1.25\,H_B(i)$ in the previous frame, then:

$$\theta_i = \min\{\theta_{max}(i),\ \theta_i + \theta_{step}(i)\}$$

where $\theta_{step}(i) > 0$ and $\theta_{max}(i) > 0$ are predefined.
If $f_i(n)$ is greater than $H_{hdr,i}(n-1) + 1.25\,H_B(i)$, then it is decreased by:

$$\theta_i = \max\{0,\ \theta_i - \theta_{step}(i)\}$$

Otherwise, the threshold is not changed.
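The threshold adaptation can be sketched as a three-way rule. The decrease step follows the text; the increase step, capped at the predefined maximum, is assumed by symmetry. The spread term around the previous overhead is taken as an input named `h_b`, and the function name `update_shape_threshold` and the example values are hypothetical.

```python
def update_shape_threshold(theta, target_bits, prev_header_bits, h_b, step, theta_max):
    """Raise, lower or keep the shape threshold of one video object."""
    if target_bits < prev_header_bits - 1.25 * h_b:
        # Target well below the previous overhead: code shape more coarsely.
        return min(theta_max, theta + step)   # assumption: capped increase
    if target_bits > prev_header_bits + 1.25 * h_b:
        # Target well above the previous overhead: shape can be coded finely.
        return max(0, theta - step)
    return theta


print(update_shape_threshold(0, 500, 1000, 100, 16, 255))
print(update_shape_threshold(8, 2000, 1000, 100, 16, 255))
```

The dead band of 1.25 times the spread on either side keeps the threshold from oscillating when the target stays close to the previous overhead.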

20 

When controlling the video object layer, the switched frame skipping control will 
preferably be used. 

Besides controlling the frame layer bit rate and the video object layer, the 
25 present scheme can also control the macroblock layer control. The method is 
thus scalable. 

It is to be understood that various alterations additions and/or modifications may 
be made to the parts previously described without departing from the ambit of 
30 the invention, and that, in the light of the teachings of the present invention, the 
control scheme may be implement in software and/or hardware in a variety of 
manners. 

1. A bit rate control system for the encoding of a video sequence in which
encoded data is placed in a buffer prior to transmission, and in which a target
encoding bit rate is determined based on the fullness of the buffer,
characterised in that the buffer is modelled on a fluid-flow traffic model.

2. The system of claim 1, wherein said fluid-flow traffic model is of the
form:

$$B_c(n+1) = \max\{0,\; B_c(n) + T(n) - u(n)\}$$

where $B_c(n)$ denotes the buffer level at time n;
T(n) is the actual encoding bit rate; and
u(n) is the channel output rate.
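
As an illustration of the fluid-flow buffer model, the following Python sketch applies the update once per frame interval. The function names and the sample figures are hypothetical, and all quantities are assumed to be expressed in bits per frame interval.

```python
def next_buffer_level(b_c, t_n, u_n):
    """One step of the fluid-flow model: add encoded bits, drain channel bits."""
    # The buffer cannot hold a negative number of bits, hence the floor at zero.
    return max(0.0, b_c + t_n - u_n)


def simulate_buffer(initial_level, frame_bits, channel_bits):
    """Return the buffer level before the first frame and after every frame."""
    levels = [initial_level]
    for t_n, u_n in zip(frame_bits, channel_bits):
        levels.append(next_buffer_level(levels[-1], t_n, u_n))
    return levels
```

For example, `simulate_buffer(1000, [500, 200, 100], [400, 400, 2000])` traces the level as it rises, falls, and is finally clamped at zero when the channel drains more than the buffer holds.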

3. The system of claim 1, in which a rate-distortion model, used to compute
a quantization parameter for the control system, has the form:



n a 2 a rr 
20 2 g r + <?1 fi 



where R is the total number of bits used to encode a frame;
Q is the quantization parameter;
$c_1$ and $c_2$ are first and second order coefficients;
$\sigma$ is an index of video coding complexity; and
$H_{hdr}$ is the amount of overhead bits used.

4. The bit rate control system of claim 1, 2 or 3, wherein a buffer occupancy
target is set at a predefined percentage of a safety margin, said safety margin
being used to determine whether a frame of the video sequence to be encoded
should be skipped.

5. The bit rate control system of claim 4, wherein said buffer target
occupancy is set to about 50% of said safety margin.

6. The bit rate control system of any preceding claim, wherein said target
encoding bit rate is given by the equation:



7(«) = ^ + (l-y) I ^- + (y-l)5 e 



where A is the channel output rate;
$\gamma$ is a buffer safety margin;
$B_s$ is the buffer size;
$B_c(n)$ is the current buffer level; and
0 < y < 1 is an adjustable parameter.

7. The bit rate control system of claim 6, wherein A is equal to the number
of bits available for encoding all of the inter-frames of a current group of frames
being encoded divided by the number of inter-frames to be encoded in the
current group of frames.

8. The system of claim 7, wherein the available bits $R_0(i)$ for encoding the
inter-frames of the ith group of frames is:

$$R_0(i) = TB_i - k_i + (0.5 \cdot B_s \cdot \gamma - B_c(i))$$

where $TB_i$ is the number of bits available to encode the ith group of
frames;
$k_i$ is the number of bits used to encode the ith intra-frame;
$B_s$ is the buffer size;
$\gamma$ is the buffer safety margin; and
$B_c(i)$ is the buffer level at the start of encoding the ith group of
frames.
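
The bit budget for the inter-frames of a group of frames, and the per-frame rate A derived from it, can be sketched in Python as below. The function and argument names are hypothetical, and the figures in the usage note are illustrative values only.

```python
def inter_frame_budget(tb_i, intra_bits, buffer_size, safety_margin, buffer_level):
    """Bits available for the inter-frames of one group of frames.

    The correction term is positive when the buffer sits below its target
    occupancy (half of the safety margin) and negative when it sits above it.
    """
    target_occupancy = 0.5 * buffer_size * safety_margin
    return tb_i - intra_bits + (target_occupancy - buffer_level)


def per_frame_rate(budget, num_inter_frames):
    """Rate A: the inter-frame budget shared equally among the inter-frames."""
    if num_inter_frames <= 0:
        raise ValueError("a group of frames needs at least one inter-frame")
    return budget / num_inter_frames
```

With a group budget of 100000 bits, an intra-frame of 20000 bits, a 50000-bit buffer, a safety margin of 0.8 and a starting buffer level of 15000 bits, the budget comes to about 85000 bits, or roughly 8500 bits for each of 10 inter-frames.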

9. The bit rate control system of claim 6, wherein A is the estimated actual
channel bandwidth.

10. The system of any preceding claim, wherein the target bit rate is modified 
based on the remaining bits available for encoding and on the remaining frames 
to be encoded. 

11. The system of claim 10, wherein the target bit rate is:

$$f(n) = \max\left\{\beta \cdot \frac{T_r}{N_r} + (1 - \beta) \cdot f(n),\; \frac{T_r}{3N_r} + H_{hdr}(n-1)\right\}$$

where 0 < $\beta$ < 1 is an adjustable parameter;
$T_r$ is the number of remaining bits available for encoding;
$N_r$ is the number of frames remaining to be encoded; and
$H_{hdr}(n-1)$ is the amount of overhead bits used for the previous
frame.

12. The system of any preceding claim, wherein the bit rate control uses a
rate-distortion model to determine the quantization parameter for a frame to be
encoded, and wherein the coefficients of said model are updated based upon
data from a plurality of previous frames, the number of previous frames used
being determined by a sliding window mechanism, wherein the value of the
current window size W(n) is given by:

$$W(n) = \min\{W(n-1) + 1,\; q(n) \cdot W_{max}\}$$

where $W_{max}$ is a preset constant; and

$$q(n) = \min\left\{\frac{\sigma(n-1)}{\sigma(n)},\; \frac{\sigma(n)}{\sigma(n-1)}\right\}$$



13. The system of any preceding claim, wherein after the current frame is
encoded, the next frame to be encoded will be skipped, if:
B c (n + Y) + T<in)-AZB s *y+T B (n)

where $B_c(n+1)$ is the current buffer level;
T(n) is the actual number of bits used to encode the current frame;
A is the channel output rate;
$B_s$ is the buffer size;
$\gamma$ is a pre-determined buffer safety margin; and

i W{n) i W(n) 

where T(SQ,n)) (1 < j < W(n)) denotes the total number of actual 
bits generated in the encoding of the previous W(n) frames. 

14. The system of any of claims 1 to 13, wherein after the current frame is
encoded, the total number of actual bits used to encode the current frame is
added to the current buffer level, and wherein a frame skipping parameter $N_{post}$
is set to skip the next $N_{post}$ frames so that the following buffer condition is
satisfied:

$$B_c(n+1) < \gamma B_s$$

where

$$B_c(n+1) = \max\{0,\; B_c(n) + T(n) - A(N_{post} + 1)\}$$

where $B_c(n)$ is the buffer level at time n;
T(n) is the actual number of bits used to encode the current frame;
A is the channel output rate;
$B_s$ is the buffer size; and
$\gamma$ is a pre-determined buffer safety margin.
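
A minimal Python sketch of the post-frame skipping control follows. It assumes that the smallest number of skipped frames satisfying the buffer condition is the one wanted, which the text does not state explicitly; the function name, argument names and the `max_skip` guard are hypothetical.

```python
def post_frame_skip(b_c, t_n, a, buffer_size, safety_margin, max_skip=1000):
    """Return (n_post, next_level) after the current frame has been encoded.

    n_post is the smallest number of frames to skip so that the buffer level
    falls below safety_margin * buffer_size (assumed selection rule).
    """
    if a <= 0:
        raise ValueError("channel output per frame interval must be positive")
    limit = safety_margin * buffer_size
    for n_post in range(max_skip + 1):
        # Each skipped frame gives the channel one more interval to drain.
        next_level = max(0.0, b_c + t_n - a * (n_post + 1))
        if next_level < limit:
            return n_post, next_level
    raise RuntimeError("no skip count within max_skip satisfies the condition")
```

With a buffer level of 30000 bits, a 20000-bit frame, a channel output of 8000 bits per interval and a limit of 0.8 of a 50000-bit buffer, one frame has to be skipped before the level drops under the limit.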

15. The system of claims 13 and 14, wherein the skipping control of claim 13
is provided as a predictive switching control, the skipping control of claim 14 is
provided as a post-frame skipping control, and the skipping controls are
switched between one another based on a switching law, said switching law
being:

a) The predictive frame skipping control is switched to the post-skipping
control if a frame is skipped; and

b) The post-frame skipping control is switched to the predictive frame
skipping control if the current frame is not skipped.
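
The switching law amounts to a two-state machine, sketched here in Python. The mode names and the choice of the predictive control as the starting mode in the usage note are assumptions for illustration.

```python
PREDICTIVE = "predictive"
POST = "post"


def next_mode(mode, frame_skipped):
    """Apply the switching law to pick the skipping control for the next frame."""
    if mode == PREDICTIVE and frame_skipped:
        # Rule a): a skip under predictive control hands over to post-frame control.
        return POST
    if mode == POST and not frame_skipped:
        # Rule b): a frame that is not skipped hands back to predictive control.
        return PREDICTIVE
    # In every other case the active control stays in place.
    return mode
```

Starting in the predictive mode and feeding the skip decisions `[False, True, True, False]` gives the mode sequence predictive, post, post, predictive.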


16. A method for encoding a video sequence, including the step of placing 
encoded data into a buffer prior to transmission, and the step of determining a 
target encoding bit rate based on the fullness of the buffer, characterised by the 
step of modelling the buffer based on a fluid-flow traffic model. 

17. The method of claim 16, wherein the fluid-flow model is of the form:

$$B_c(n+1) = \max\{0,\; B_c(n) + T(n) - u(n)\}$$

where $B_c(n)$ denotes the buffer level at time n;
T(n) is the actual encoding bit rate; and
u(n) is the channel output rate.

18. The method of claim 16 or 17, including the step of determining a
quantization parameter for encoding the data based on a rate-distortion
equation having the form:

a 2 a 



where R is the total number of bits used to encode a frame;
Q is the quantization parameter;
$c_1$ and $c_2$ are first and second order coefficients;
$\sigma$ is an index of video coding complexity; and
$H_{hdr}$ is the amount of overhead bits used.

19. Computer software for the encoding of a video sequence, wherein
encoded data is placed in a buffer prior to its transmission, and wherein the
computer software includes a component which determines a target encoding
bit rate based on the fullness of the buffer, characterised in that the software
includes a component for modelling the buffer based on a fluid-flow traffic
model.



20. The software of claim 19, including a component for determining a
quantization parameter for encoding the data based on a rate-distortion
equation having the form:

a 2 a 



where R is the total number of bits used to encode a frame;
Q is the quantization parameter;
$c_1$ and $c_2$ are first and second order coefficients;
$\sigma$ is an index of video coding complexity; and
$H_{hdr}$ is the amount of overhead bits used.



21. A bit rate control system for the encoding of video data, wherein a
rate-distortion model is used to determine a quantization parameter to use in the
encoding, and characterised in that the rate-distortion model has the form:



a 2 a TT 



where R is the total number of bits used to encode a frame;
Q is the quantization parameter;
$c_1$ and $c_2$ are first and second order coefficients;
$\sigma$ is an index of video coding complexity; and
$H_{hdr}$ is the amount of overhead bits used.

22. A bit rate control system for the encoding of a video sequence in which
encoded data is placed in a buffer prior to its transmission, and in which a target
encoding bit rate is determined based on the fullness of the buffer,
characterised in that a buffer occupancy target is set at a set percentage of a
safety margin, said safety margin being used to determine whether a frame of
the video sequence to be encoded should be skipped.

23. The bit rate control system of claim 22, wherein said buffer target
occupancy is set to about 50% of said safety margin.

24. A bit rate control system for the encoding of a video sequence in which
encoded data is placed in a buffer prior to its transmission, and in which a target
encoding bit rate is determined based on the fullness of the buffer, the bit rate
control using a rate-distortion model to determine a quantization parameter for a
frame to be encoded, and wherein the coefficients of said model are updated
based upon data from a plurality of previous frames, the number of previous
frames used being determined by a sliding window mechanism, characterised in
that the value of the current window size W(n) is given by:

$$W(n) = \min\{W(n-1) + 1,\; q(n) \cdot W_{max}\}$$

where $W_{max}$ is a preset constant; and

$$q(n) = \min\left\{\frac{\sigma(n-1)}{\sigma(n)},\; \frac{\sigma(n)}{\sigma(n-1)}\right\}$$

$\sigma(n)$ being an index of video coding complexity.
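
The sliding window size can be sketched in Python as below. The sketch reads q(n) as the smaller of the two ratios between the complexity indices of consecutive frames, which is an assumption about the intended formula; the function and argument names are hypothetical, and any rounding of the result to a whole number of frames is left to the implementer.

```python
def window_size(w_prev, sigma_prev, sigma_cur, w_max):
    """Return W(n): the number of previous frames used to update the model."""
    if sigma_prev <= 0 or sigma_cur <= 0:
        raise ValueError("complexity indices must be positive")
    # q is 1 when complexity is unchanged and shrinks towards 0 as the
    # complexity of consecutive frames diverges (assumed reading of q(n)).
    q = min(sigma_prev / sigma_cur, sigma_cur / sigma_prev)
    # Grow by at most one frame per step; shrink at once on a sharp change.
    return min(w_prev + 1, q * w_max)
```

Under this reading the window grows slowly while the scene is stable and collapses quickly at a scene change, so that stale data points stop influencing the model coefficients.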



25. A bit rate control system for the encoding of a video sequence in which
encoded data is placed in a buffer prior to its transmission, and in which video
data to be encoded is skipped if it is determined that buffer overflow may occur,
characterised in that said skip control comprises:

a) a predictive skip control, in which, after the current frame is
encoded, the next frame to be encoded will be skipped, if:

B e (n + l) + T(n)-A±B M *y +T B (n) 


where $B_c(n+1)$ is the current buffer level;
T(n) is the actual number of bits used to encode the current frame;
A is the channel output rate;
$B_s$ is the buffer size;
$\gamma$ is a pre-determined buffer safety margin; and



1 W(n)( i Win) \ 



J 



where T(SQ\n)) (1< j < W s (n)) denotes the total number of actual 
15 bits generated in the encoding of the previous W(n) frames; 

b) a post frame skip control in which after the current frame is
encoded, the total number of actual bits used to encode the current frame is
added to the current buffer level, and wherein a frame skipping parameter $N_{post}$
is set to skip the next $N_{post}$ frames so that the following buffer condition is
satisfied:

$$B_c(n+1) < \gamma B_s$$

where

$$B_c(n+1) = \max\{0,\; B_c(n) + T(n) - A(N_{post} + 1)\}$$

where $B_c(n)$ is the buffer level at time n;
T(n) is the actual number of bits used to encode the current frame;
A is the channel output rate;
$B_s$ is the buffer size; and
$\gamma$ is a pre-determined buffer safety margin.

c) a switching law through which the skipping controls are switched
between one another, said switching law being:

a) The predictive frame skipping control is switched to the post-skipping
control if a frame is skipped; and

b) The post-frame skipping control is switched to the predictive frame
skipping control if the current frame is not skipped.